In the Claims:

Claims 1-34 are pending in this application, and the status of each is listed
below.

1. (Currently amended) A computer assisted method of auditing a superset
of training data, the superset comprising examples of documents having one or more
preexisting category assignments, the method including:

partitioning the superset into at least two disjoint sets, including a test set and a
training set, wherein the test set includes one or more test documents and the
training set includes examples of documents belonging to at least two
categories;

automatically categorizing the test documents using the training set; 

calculating a metric of confidence based on results of the categorizing step and
comparing the automatic category assignments for the test documents to the
preexisting category assignments; and

reporting the test documents and preexisting category assignments that are
suspicious and the automatic category assignments that appear to be missing from
the test documents, based on the metric of confidence.
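
As an informal illustration of the steps recited in claim 1 (it is not part of the claim language), the following minimal sketch partitions a small superset one test document at a time, categorizes each test document from its nearest neighbors in the training set, and reports suspicious and apparently missing assignments. The cosine similarity, the vote-fraction confidence, and the names `audit`, `k`, `low`, and `high` are assumptions chosen for illustration; the claim does not prescribe them.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def audit(superset, k=3, low=0.34, high=0.66):
    """superset: list of (doc_id, vector, set_of_categories), at least two entries.

    Returns (suspicious, missing) lists of (doc_id, category, confidence).
    k, low, and high are hypothetical tuning parameters.
    """
    suspicious, missing = [], []
    for i, (doc_id, vec, assigned) in enumerate(superset):
        # Partition: one test document; every other document is the training set.
        training = superset[:i] + superset[i + 1:]
        # Categorize: take the k training documents most similar to the test document.
        neighbors = sorted(training, key=lambda d: cosine(vec, d[1]), reverse=True)[:k]
        categories = set().union(*(d[2] for d in training))
        for cat in sorted(categories):
            # Confidence (assumed form): fraction of neighbors carrying this category.
            conf = sum(1 for d in neighbors if cat in d[2]) / len(neighbors)
            if cat in assigned and conf <= low:
                suspicious.append((doc_id, cat, conf))   # preexisting assignment looks wrong
            elif cat not in assigned and conf >= high:
                missing.append((doc_id, cat, conf))      # assignment appears to be missing
    return suspicious, missing
```

Calling `audit` on a superset in which one document sits among documents of another category would list that document's preexisting assignment as suspicious and the neighboring category as apparently missing.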

2. (Original) The method of claim 1, further including repeating the
partitioning, categorizing and calculating steps until at least one-half of the
documents in the superset have been assigned to the test set.

3. (Original) The method of claim 2, wherein the test set created in the
partition step has a single test document.

4. (Original) The method of claim 2, wherein the test set created in the 
partition step has a plurality of test documents. 

5. (Original) The method of claim 1, further including repeating the
partitioning, categorizing and calculating steps until substantially all of the
documents in the superset have been assigned to the test set.

6. (Original) The method of claim 1, wherein the partitioning, categorizing
and calculating steps are carried out substantially without user intervention.

7. (Original) The method of claim 5, wherein the partitioning, categorizing
and calculating steps are carried out substantially without user intervention.

8. (Original) The method of claim 1, wherein the partitioning,
categorizing, calculating and reporting steps are carried out substantially without 
user intervention. 

9. (Original) The method of claim 5, wherein the partitioning, 
categorizing, calculating and reporting steps are carried out substantially without 
user intervention. 

10. (Original) The method of claim 1, wherein the categorizing step
includes determining k nearest neighbors of the test documents and the calculating
step is based on a k nearest neighbors categorization logic.

11. (Currently amended) The method of claim 10, wherein the metric of
confidence is an unweighted measure of distance between a particular test
document and the examples of documents belonging to various categories.

12. (Currently amended) The method of claim 11, where the unweighted
measure includes application of a relationship
$\Omega_0(d_t, T_m) = \sum_{d \in K(d_t) \cap T_m} s(d_t, d)$, wherein

$\Omega_0$ is a function of the particular test document represented by a feature
vector $d_t$ and of various categories $T_m$; and

$s$ is a metric of distance between the particular test document feature vector
$d_t$ and certain sample documents represented by feature vectors $d$, the
certain sample documents being among a set of k nearest neighbors of the
particular test document having category assignments to the various
categories $T_m$.
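
As a hedged illustration of the unweighted relationship in claim 12 (not part of the claim language), the sketch below sums a similarity score over those of the k nearest neighbors of a test document that carry a given category. Treating s as a similarity where larger values mean closer documents, using a dot product for it, and the names `dot` and `unweighted_confidence` are assumptions made only for this example.

```python
def dot(a, b):
    """Dot product, used here as an assumed similarity measure s."""
    return sum(x * y for x, y in zip(a, b))

def unweighted_confidence(test_vec, training, category, k, s=dot):
    """Sum s(test_vec, d) over the k nearest neighbors that carry `category`.

    training: list of (vector, set_of_categories) pairs.
    """
    # k nearest neighbors of the test document: the k highest-similarity documents.
    neighbors = sorted(training, key=lambda d: s(test_vec, d[0]), reverse=True)[:k]
    # Keep only the neighbors assigned to the category of interest and sum their scores.
    return sum(s(test_vec, vec) for vec, cats in neighbors if category in cats)
```

Because the sum is not normalized, scores from test documents in dense and sparse neighborhoods are not directly comparable, which is the issue the weighted measure of claim 13 addresses.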

13. (Currently amended) The method of claim 10, wherein the metric of
confidence is a weighted measure of distance between a particular test
document and the examples of documents belonging to various categories, the
weighted measure taking into account the density of a neighborhood of the test
document.

14. (Currently amended) The method of claim 13, wherein the
weighted measure includes application of a relationship

$\Omega_1(d_t, T_m) = \frac{\sum_{d_1 \in K(d_t) \cap T_m} s(d_t, d_1)}{\sum_{d_2 \in K(d_t)} s(d_t, d_2)}$, wherein

$\Omega_1$ is a function of the test document represented by a feature vector $d_t$
and of various categories $T_m$; and

$s$ is a metric of distance between the test document feature vector $d_t$ and
certain sample documents represented by feature vectors $d_1$ and $d_2$, the
certain sample documents $d_1$ being among a set of k nearest neighbors of the
test document having category assignments to the various categories $T_m$ and
the certain sample documents $d_2$ being among a set of k nearest neighbors of
the test document.
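
As a hedged illustration of the weighted relationship in claim 14 (not part of the claim language), the sketch below divides the score contributed by the neighbors carrying a category by the total score of all k nearest neighbors, so the result reflects the density of the test document's neighborhood. The dot-product similarity and the names `dot` and `weighted_confidence` are assumptions made only for this example.

```python
def dot(a, b):
    """Dot product, used here as an assumed similarity measure s."""
    return sum(x * y for x, y in zip(a, b))

def weighted_confidence(test_vec, training, category, k, s=dot):
    """Ratio of in-category neighbor scores to all neighbor scores.

    training: list of (vector, set_of_categories) pairs.
    """
    neighbors = sorted(training, key=lambda d: s(test_vec, d[0]), reverse=True)[:k]
    # Numerator: neighbors (d_1) that carry the category.
    in_category = sum(s(test_vec, vec) for vec, cats in neighbors if category in cats)
    # Denominator: all k nearest neighbors (d_2), regardless of category.
    total = sum(s(test_vec, vec) for vec, _ in neighbors)
    return in_category / total if total else 0.0
```

With non-negative similarities the ratio falls between 0 and 1, which makes a single threshold usable across test documents.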

15. (Currently amended) The method of claim 1, wherein the
reporting step further includes filtering the test documents based on the metric of
confidence.

16. (Currently amended) The method of claim 15, wherein the filtering
step further includes color coding the identified test documents based on the metric
of confidence.

17. (Currently amended) The method of claim 15, wherein the filtering
step further includes selecting for display the identified test documents based on the
metric of confidence.

18. (Currently amended) The method of claim 1, wherein
reporting includes generating a printed report.

19. (Currently amended) The method of claim 1, wherein
reporting includes generating a file conforming to XML syntax.
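
As a hedged illustration of reporting in XML syntax as in claim 19 (not part of the claim language), the sketch below serializes suspicious and apparently missing assignments with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`audit`, `suspicious`, `missing`, `document`, `category`, `confidence`) and the function name `report_to_xml` are hypothetical; the claim does not specify a schema.

```python
import xml.etree.ElementTree as ET

def report_to_xml(suspicious, missing):
    """Serialize audit findings as an XML string.

    suspicious, missing: lists of (doc_id, category, confidence) tuples.
    """
    root = ET.Element("audit")  # hypothetical root element name
    for tag, rows in (("suspicious", suspicious), ("missing", missing)):
        for doc_id, category, conf in rows:
            # One child element per finding, with its details as attributes.
            ET.SubElement(
                root,
                tag,
                document=str(doc_id),
                category=str(category),
                confidence=f"{conf:.2f}",
            )
    return ET.tostring(root, encoding="unicode")
```

The returned string can be written to a file or parsed again with `ET.fromstring` for further filtering or sorting.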

20. (Currently amended) The method of claim 1, wherein
reporting includes generating a sorted display identifying at least a portion of the test
documents.

21. (Original) The method of claim 1, further including calculating a precision
score for the identified test documents.

22. (Currently amended) A computer assisted method of auditing a superset
of training data, the superset comprising examples of documents having one or more
preexisting category assignments, the method including:

determining k nearest neighbors of the documents in a test subset automatically
partitioned from the superset;

automatically categorizing the documents based on the k nearest neighbors into
a plurality of categories;

calculating a metric of confidence based on results of the categorizing step and
comparing the automatic category assignments for the documents to the
preexisting category assignments; and

reporting the documents in the test subset and preexisting category assignments
that are suspicious and the automatic category assignments that appear to be
missing from the documents in the test subset, based on the metric of
confidence.

23. (Original) The method of claim 22, wherein the metric of confidence is an 
unweighted measure of distance between the test document and the examples of 
documents belonging to various categories. 

24. (Original) The method of claim 23, where the unweighted measure
includes application of a relationship
$\Omega_0(d_t, T_m) = \sum_{d \in K(d_t) \cap T_m} s(d_t, d)$, wherein

$\Omega_0$ is a function of the test document represented by a feature vector $d_t$ and
of various categories $T_m$; and

$s$ is a metric of distance between the test document feature vector $d_t$ and certain
sample documents represented by feature vectors $d$, the certain sample
documents being among a set of k nearest neighbors of the test document
having category assignments to the various categories $T_m$.

25. (Original) The method of claim 22, wherein the metric of confidence is a 
weighted measure of distance between the test document and the examples of 
documents belonging to various categories, the weighted measure taking into account 
the density of a neighborhood of the test document. 

26. (Original) The method of claim 25, wherein the weighted measure
includes application of a relationship

$\Omega_1(d_t, T_m) = \frac{\sum_{d_1 \in K(d_t) \cap T_m} s(d_t, d_1)}{\sum_{d_2 \in K(d_t)} s(d_t, d_2)}$, wherein

$\Omega_1$ is a function of the test document represented by a feature vector $d_t$ and
of various categories $T_m$; and

$s$ is a metric of distance between the test document feature vector $d_t$ and certain
sample documents represented by feature vectors $d_1$ and $d_2$, the certain sample
documents $d_1$ being among a set of k nearest neighbors of the test document
having category assignments to the various categories $T_m$ and the certain sample
documents $d_2$ being among a set of k nearest neighbors of the test document.

27. (Original) The method of claim 22, wherein the determining, categorizing 
and calculating steps are carried out substantially without user intervention. 

28. (Currently amended) The method of claim 22, wherein the
reporting step further includes filtering the documents based on the metric of
confidence.

29. (Currently amended) The method of claim 28, wherein the filtering step
further includes color coding the reported documents based on the metric of
confidence.

30. (Currently amended) The method of claim 28, wherein the filtering step
further includes selecting for display the reported documents based on the
metric of confidence.

31. (Currently amended) The method of claim 22, wherein
reporting includes generating a printed report.

32. (Currently amended) The method of claim 22, wherein
reporting includes generating a file conforming to XML syntax.

33. (Currently amended) The method of claim 22, wherein
reporting includes generating a sorted display identifying at least a portion of the
documents.

34. (Currently amended) The method of claim 22, further including
calculating a precision score for the reported documents.